music domain
- Media > Music (1.00)
- Leisure & Entertainment (1.00)
MuCPT: Music-related Natural Language Model Continued Pretraining
Tian, Kai, Mao, Yirong, Bi, Wendong, Wang, Hanjie, Wenhui, Que
Large language models perform strongly on general tasks but remain constrained in specialized settings such as music, particularly in the music-entertainment domain, where corpus scale, purity, and the match between data and training objectives are critical. We address this by constructing a large, music-related natural language corpus (40B tokens) that combines open source and in-house data, and by implementing a domain-first data pipeline: a lightweight classifier filters and weights in-domain text, followed by multi-stage cleaning, de-duplication, and privacy-preserving masking. We further integrate multi-source music text with associated metadata to form a broader, better-structured foundation of domain knowledge. On the training side, we introduce reference-model (RM)-based token-level soft scoring for quality control: a unified loss-ratio criterion is used both for data selection and for dynamic down-weighting during optimization, reducing noise gradients and amplifying task-aligned signals, thereby enabling more effective music-domain continued pretraining and alignment. To assess factuality, we design the MusicSimpleQA benchmark, which adopts short, single-answer prompts with automated agreement scoring. Beyond the benchmark design, we conduct systematic comparisons along the axes of data composition. Overall, this work advances both the right corpus and the right objective, offering a scalable data-training framework and a reusable evaluation tool for building domain LLMs in the music field.
- Media > Music (1.00)
- Leisure & Entertainment (1.00)
Checklist
The checklist follows the references. For example: Did you include the license to the code and datasets? Please do not modify the questions and only use the provided macros for your answers. Checklist section does not count towards the page limit. Do the main claims made in the abstract and introduction accurately reflect the paper's Did you discuss any potential negative societal impacts of your work?
- Media > Music (1.00)
- Leisure & Entertainment (1.00)
MusGO: A Community-Driven Framework For Assessing Openness in Music-Generative AI
Batlle-Roca, Roser, Ibáñez-Martínez, Laura, Serra, Xavier, Gómez, Emilia, Rocamora, Martín
Since 2023, generative AI has rapidly advanced in the music domain. Despite significant technological advancements, music-generative models raise critical ethical challenges, including a lack of transparency and accountability, along with risks such as the replication of artists' works, which highlights the importance of fostering openness. With upcoming regulations such as the EU AI Act encouraging open models, many generative models are being released labelled as 'open'. However, the definition of an open model remains widely debated. In this article, we adapt a recently proposed evidence-based framework for assessing openness in LLMs to the music domain. Using feedback from a survey of 110 participants from the Music Information Retrieval (MIR) community, we refine the framework into MusGO (Music-Generative Open AI), which comprises 13 openness categories: 8 essential and 5 desirable. We evaluate 16 state-of-the-art generative models and provide an openness leaderboard that is fully open to public scrutiny and community contributions. Through this work, we aim to clarify the concept of openness in music-generative AI and promote its transparent and responsible development.
- North America > United States (0.05)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Europe > Spain > Andalusia > Seville Province > Seville (0.04)
- Asia > South Korea > Daejeon > Daejeon (0.04)
- Research Report (1.00)
- Questionnaire & Opinion Survey (1.00)
- Media > Music (1.00)
- Leisure & Entertainment (1.00)
- Law (1.00)
- (2 more...)
A Benchmark and Robustness Study of In-Context-Learning with Large Language Models in Music Entity Detection
Hachmeier, Simon, Jäschke, Robert
Detecting music entities such as song titles or artist names is a useful application to help use cases like processing music search queries or analyzing music consumption on the web. Recent approaches incorporate smaller language models (SLMs) like BERT and achieve high results. However, further research indicates a high influence of entity exposure during pre-training on the performance of the models. With the advent of large language models (LLMs), these outperform SLMs in a variety of downstream tasks. However, researchers are still divided if this is applicable to tasks like entity detection in texts due to issues like hallucination. In this paper, we provide a novel dataset of user-generated metadata and conduct a benchmark and a robustness study using recent LLMs with in-context-learning (ICL). Our results indicate that LLMs in the ICL setting yield higher performance than SLMs. We further uncover the large impact of entity exposure on the best performing LLM in our study.
- North America > United States > New York (0.04)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- (4 more...)
- Media > Music (1.00)
- Leisure & Entertainment (1.00)
Resolving Indirect Referring Expressions for Entity Selection
Hosseini, Mohammad Javad, Radlinski, Filip, Pareti, Silvia, Louis, Annie
Recent advances in language modeling have enabled new conversational systems. In particular, it is often desirable for people to make choices among specified options when using such systems. We address this problem of reference resolution, when people use natural expressions to choose between the entities. For example, given the choice `Should we make a Simnel cake or a Pandan cake?' a natural response from a dialog participant may be indirect: `let's make the green one'. Such natural expressions have been little studied for reference resolution. We argue that robustly understanding such language has large potential for improving naturalness in dialog, recommendation, and search systems. We create AltEntities (Alternative Entities), a new public dataset of 42K entity pairs and expressions (referring to one entity in the pair), and develop models for the disambiguation problem. Consisting of indirect referring expressions across three domains, our corpus enables for the first time the study of how language models can be adapted to this task. We find they achieve 82%-87% accuracy in realistic settings, which while reasonable also invites further advances.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > Middle East > Jordan (0.04)
- Oceania > Australia > Victoria > Melbourne (0.04)
- (10 more...)
- Media > Music (1.00)
- Leisure & Entertainment (1.00)